47 research outputs found

    Designing seeds for similarity search in genomic DNA

    Large-scale comparison of genomic DNA is of fundamental importance in annotating functional elements of genomes. To perform large comparisons efficiently, BLAST (Methods: Companion Methods Enzymol 266 (1996) 460, J. Mol. Biol. 215 (1990) 403, Nucleic Acids Res. 25(17) (1997) 3389) and other widely used tools use seeded alignment, which compares only sequences that can be shown to share a common pattern or “seed” of matching bases. The literature suggests that the choice of seed substantially affects the sensitivity of seeded alignment, but designing and evaluating seeds is computationally challenging. This work addresses the problem of designing a seed to optimize performance of seeded alignment. We give a fast, simple algorithm based on finite automata for evaluating the sensitivity of a seed in a Markov model of ungapped alignments, along with extensions to mixtures and inhomogeneous Markov models. We give intuition and theoretical results on which seeds are good choices. Finally, we describe Mandala, a software tool for seed design, and show that it can be used to improve the sensitivity of alignment in practice.
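To make "sensitivity of a seed" concrete, here is a minimal sketch that computes the probability that a seed hits at least once in an ungapped alignment. It is not the automaton-based algorithm or the Mandala tool from the abstract. It assumes a simpler model in which every alignment column matches independently with the same probability, and it uses a plain dynamic program over the most recent columns. The function name `seed_sensitivity`, the `'1'`/`'0'` seed notation (`'1'` = must match, `'0'` = don't care), and all parameters are illustrative assumptions.

```python
from collections import defaultdict


def seed_sensitivity(seed: str, length: int, p_match: float) -> float:
    """Probability that `seed` hits an ungapped alignment of `length` columns.

    seed: string over {'1', '0'}; '1' = column must match, '0' = don't care.
    Assumption: columns match independently with probability p_match
    (a Bernoulli model, simpler than the Markov models in the abstract).
    """
    span = len(seed)
    if span == 0 or set(seed) - {"0", "1"}:
        raise ValueError("seed must be a non-empty string of 0s and 1s")
    if length < span:
        return 0.0

    mask = int(seed, 2)            # bits set where the seed requires a match
    keep = (1 << (span - 1)) - 1   # retains only the most recent span-1 columns

    # no_hit[state] = probability of seeing `state` as the last span-1
    # columns without any seed hit so far.
    no_hit = {0: 1.0}
    for i in range(length):
        nxt = defaultdict(float)
        for state, prob in no_hit.items():
            for bit, q in ((1, p_match), (0, 1.0 - p_match)):
                window = (state << 1) | bit   # the last `span` columns
                # A hit is only possible once `span` real columns exist.
                if i >= span - 1 and (window & mask) == mask:
                    continue                  # seed hit: remove this mass
                nxt[window & keep] += prob * q
        no_hit = nxt

    # Whatever probability mass never hit is the miss probability.
    return 1.0 - sum(no_hit.values())


if __name__ == "__main__":
    # Compare a spaced seed with a contiguous seed of the same weight (3).
    for s in ("111", "1101"):
        print(s, seed_sensitivity(s, 64, 0.7))
```

The number of states grows as 2^(span-1), so this sketch is only practical for short seeds; it is meant to illustrate the quantity being optimized rather than to be an efficient evaluator.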

    Deep RNA sequencing of L. monocytogenes reveals overlapping and extensive stationary phase and sigma B-dependent transcriptomes, including multiple highly transcribed noncoding RNAs

    BACKGROUND: Identification of specific genes and gene expression patterns important for bacterial survival, transmission and pathogenesis is critically needed to enable development of more effective pathogen control strategies. The stationary phase stress response transcriptome, including many σ(B)-dependent genes, was defined for the human bacterial pathogen Listeria monocytogenes using RNA sequencing (RNA-Seq) with the Illumina Genome Analyzer. Specifically, bacterial transcriptomes were compared between stationary phase cells of L. monocytogenes 10403S and an otherwise isogenic ΔsigB mutant, which does not express the alternative σ factor σ(B), a major regulator of genes contributing to stress response, including stresses encountered upon entry into stationary phase. RESULTS: Overall, 83% of all L. monocytogenes genes were transcribed in stationary phase cells; 42% of currently annotated L. monocytogenes genes showed medium to high transcript levels under these conditions. A total of 96 genes had significantly higher transcript levels in 10403S than in ΔsigB, indicating σ(B)-dependent transcription of these genes. RNA-Seq analyses indicate that a total of 67 noncoding RNA molecules (ncRNAs) are transcribed in stationary phase L. monocytogenes, including 7 previously unrecognized putative ncRNAs. Application of a dynamically trained Hidden Markov Model, in combination with RNA-Seq data, identified 65 putative σ(B) promoters upstream of 82 of the 96 σ(B)-dependent genes and upstream of the one σ(B)-dependent ncRNA. The RNA-Seq data also enabled annotation of putative operons as well as visualization of 5'- and 3'-UTR regions. CONCLUSIONS: The results from these studies provide powerful evidence that RNA-Seq data combined with appropriate bioinformatics tools allow quantitative characterization of prokaryotic transcriptomes, thus providing exciting new strategies for exploring transcriptional regulatory networks in bacteria.
    See minireview http://jbiol.com/content/8/12/107

    Novel features of ARS selection in budding yeast Lachancea kluyveri

    BACKGROUND: The characterization of DNA replication origins in yeast has shed much light on the mechanisms of initiation of DNA replication. However, very little is known about the evolution of origins or the evolution of mechanisms through which origins are recognized by the initiation machinery. This lack of understanding is largely due to the vast evolutionary distances between model organisms in which origins have been examined. RESULTS: In this study we have isolated and characterized autonomously replicating sequences (ARSs) in Lachancea kluyveri, a pre-whole genome duplication (WGD) budding yeast. Through a combination of experimental work and rigorous computational analysis, we show that L. kluyveri ARSs require a sequence that is similar to but much longer than the ARS Consensus Sequence well defined in Saccharomyces cerevisiae. Moreover, compared with S. cerevisiae and Kluyveromyces lactis, the replication licensing machinery in L. kluyveri seems more tolerant of variations in the ARS sequence composition. It is able to initiate replication from almost all S. cerevisiae ARSs tested and most K. lactis ARSs. In contrast, only about half of the L. kluyveri ARSs function in S. cerevisiae and less than 10% function in K. lactis. CONCLUSIONS: Our findings demonstrate a replication initiation system with novel features and underscore the functional diversity within the budding yeasts. Furthermore, we have developed new approaches for analyzing biologically functional DNA sequences with ill-defined motifs.

    A Comprehensive Genome-Wide Map of Autonomously Replicating Sequences in a Naive Genome

    Eukaryotic chromosomes initiate DNA synthesis from multiple replication origins. The machinery that initiates DNA synthesis is highly conserved, but the sites where the replication initiation proteins bind have diverged significantly. Functional comparative genomics is an obvious approach to study the evolution of replication origins. However, to date, the Saccharomyces cerevisiae replication origin map is the only genome map available. Using an iterative approach that combines computational prediction and functional validation, we have generated a high-resolution genome-wide map of DNA replication origins in Kluyveromyces lactis. Unlike other yeasts or metazoans, K. lactis autonomously replicating sequences (KlARSs) contain a 50 bp consensus motif suggestive of a dimeric structure. This motif is necessary and largely sufficient for initiation and was used to dependably identify 145 of the up to 156 non-repetitive intergenic ARSs projected for the K. lactis genome. Though similar in genome size, K. lactis has half as many ARSs as its distant relative S. cerevisiae. Comparative genomic analysis shows that ARSs in K. lactis and S. cerevisiae preferentially localize to non-syntenic intergenic regions, linking ARSs with loci of accelerated evolutionary change.

    Finding motifs in the twilight zone

    We introduce the notion of a multiprofile and use it for finding subtle motifs in DNA sequences. Multiprofiles generalize the notion of a profile and allow one to detect subtle consensus sequences that escape detection by the standard profiles. Our MULTIPROFILER algorithm outperforms other leading motif finding algorithms in a number of synthetic models. Moreover, it can be shown that in some previously studied motif models, MULTIPROFILER is capable of pushing the performance envelope to its theoretical limits.